Somewhere along the way in a math class in elementary, middle or high school, you may have encountered the idea of graphing lines and thinking about the equations that represent those lines.
Let’s think about this idea using the hypothetical example of several runners.
The orange line represents a runner who runs 6 miles every hour. In 2 hours, this person will have run 12 miles, and if that runner continued for 3 hours of running, they would run 18 miles.
We can write an equation that represents the distance run in several equivalent ways:
\(\text{distance} = 6 \times \text{time}\)
\(\text{miles} = 6 \times \text{hours}\)
\(y = 6 \times x\)
In this case, the runner’s speed (6 miles per hour) is what we call the slope of the line. The orange runner is getting 6 miles of distance for every hour spent running. Economists sometimes talk about this idea as the “rate of return”: For every hour of running, the orange runner gets 6 miles of distance.
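To make the equation concrete, here is a minimal sketch in R. The function name `orange_distance` is made up for this illustration; it simply multiplies hours by the slope of 6 miles per hour.

```r
# Hypothetical helper: miles covered by a runner going 6 miles per hour
orange_distance <- function(hours) {
  6 * hours
}

# Distance after 1, 2, and 3 hours of running
orange_distance(c(1, 2, 3))
# returns 6, 12, and 18 miles
```

Changing the 6 to another number changes the slope, which is the same as changing the runner's speed.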
Imagine now two other runners, represented by a red line and a blue line.
The red runner starts at the same place as the orange runner, but runs at a slower pace of 3 miles per hour. We can say that the red runner’s line is flatter than the orange runner’s line: its slope is 3 miles per hour rather than 6.
The blue runner’s situation is somewhat different. The blue runner runs at the same speed as the orange runner. We can say that their lines on the graph have the same slope.
But after two hours of running, the blue runner is further along because the blue runner started 2 miles ahead of the orange runner. We need a new term to describe this idea. In graphing, and in statistics, we say that the blue runner’s line intercepts the y axis at a higher point than the line for the orange runner. Put another way, the blue runner has a higher y-intercept than the orange runner.
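The three runners can be compared with a short sketch in R. The function `runner_distance` and its argument names are assumptions made for this illustration; the slopes and the 2-mile head start come from the description above.

```r
# Hypothetical helper: a line described by its slope (speed in miles per hour)
# and its y-intercept (head start in miles)
runner_distance <- function(hours, slope, intercept = 0) {
  intercept + slope * hours
}

hours <- 2
orange <- runner_distance(hours, slope = 6)                 # 12 miles
red    <- runner_distance(hours, slope = 3)                 # 6 miles
blue   <- runner_distance(hours, slope = 6, intercept = 2)  # 14 miles
```

The orange and blue runners share a slope, so the gap between them stays at 2 miles no matter how many hours pass. That constant gap is the difference in their y-intercepts.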
These two concepts of the slope and the y-intercept are the foundations of the idea of regression.
Let’s stick with the hypothetical example of runners, but now let’s imagine a slightly different situation. Imagine that we have data on how far several different runners have run, and we want to find the average speed of these runners. (You could also think of this as the average rate of change of distance over time.)
I want to draw the line that best fits these data to get a sense of how fast these runners run, on average.
My guess is drawn as a blue line.
I could even make a guess about the slope, which represents the average speed: how far does the distance go up for every hour that is run, on average?
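One way to check a guess like this is to let R fit the line. The sketch below uses a small made-up data set (the numbers are invented for illustration and are not the data plotted here) and base R’s `lm()` function to estimate the slope and y-intercept of the best-fitting line.

```r
# Made-up illustration data: hours run and miles covered by five runners
runners <- data.frame(
  hours = c(1, 2, 3, 4, 5),
  miles = c(5, 11, 14, 22, 28)
)

# Fit a straight line of the form: miles = intercept + slope * hours
fit <- lm(miles ~ hours, data = runners)

# Pull out the two numbers that describe the line
intercept <- unname(coef(fit)["(Intercept)"])
slope     <- unname(coef(fit)["hours"])

slope      # estimated average speed, in miles per hour
intercept  # estimated distance at zero hours
```

For these invented numbers the fitted slope is 5.7, meaning that on average distance goes up by about 5.7 miles for every additional hour of running.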